multiple convolution layers that apply filters to the input image to extract features such as edges,
shapes, and patterns. These convolution layers are followed by batch normalization layers, which
normalize their output to accelerate the learning process and prevent overfitting. The performance of
the proposed CNN model was evaluated on publicly available datasets of skin lesion images, and the
findings showed that it outperformed several state-of-the-art methods for melanoma classification.
The authors also conducted ablation studies to analyze each layer’s contribution to the model’s overall
performance. The proposed DL approach has the potential to assist dermatologists in the early detection
of MSC, which can lead to more effective treatment and improved patient outcomes. It also demonstrates
the effectiveness of DL techniques for medical image analysis and highlights the importance of carefully
designing and optimizing CNN models for high performance. The accuracy of the proposed system is 99.99%.

Keywords: Melanoma; Skin cancer

1. INTRODUCTION 

Skin cancer is among the most frequent cancers and is a major cause of death worldwide [1]. Environmental
changes contribute to cancer today [2]; ultraviolet (UV) radiation, for instance, is a major risk factor for
skin cancer. In 1975, Fitzpatrick proposed a scale from I to VI that classifies skin according to its type and
its response to UV rays. Type I is very light skin, which is the most likely to develop skin cancer, whereas
type VI is dark brown, strongly pigmented skin, which is less affected. Therefore, this type of cancer is more
common in countries with predominantly light-skinned populations. In recent decades, skin cancer incidence
has climbed dramatically in the US, Europe, and Australia. Skin cancer affects about one million Americans
yearly and accounts for up to half of all cancers. 

The skin has two primary layers: the epidermis, the outermost and most visible layer, and the dermis, the
inner and less visible layer. The epidermis is made up mainly of two cell types: squamous (flat) cells and
basal (round) cells. Melanocytes, the pigment cells that produce melanin, are found in the lowest part of the
epidermis. Melanin is the pigment responsible for skin color. In direct sunlight, melanocytes produce more
pigment, which darkens the skin. The dermis contains lymphatic vessels, blood vessels, hair follicles, and
glands. The glands in the dermis are of two types: those that produce sweat to help the body regulate
temperature and those that produce sebum to help prevent the skin from drying out. These glands reach the
outer layer of the skin through pores, which are very small openings on the skin’s surface. 

Furthermore, skin cancer, the abnormal growth of skin cells, accounts for a quarter of all new malignancies.
These cells grow without normal control, multiply to form a mass called a lesion, and can invade other parts
of the body [3]. Like many other types of cancer, skin cancer can be fatal if not treated early. It begins as a
precancerous lesion, which is not malignant at first but becomes malignant over time [4]. Therefore, early
detection is the highest priority for saving patients, and it is a central goal for researchers and clinicians. 

Skin cancers can be divided into two main types: melanoma skin cancer (MSC) and non-melanoma
skin cancer (NMSC). NMSC is the most common type of skin cancer and affects 2-3 million people every
year. Its main subtypes are basal cell carcinoma (BCC), which accounts for approximately 75% of all
non-melanoma cases; squamous cell carcinoma (SCC), which accounts for approximately 24%; and sebaceous
carcinoma (SC), which accounts for approximately 1%, among others. 

Melanoma is less common but more serious and aggressive than other skin cancers; it is divided into
benign and malignant melanoma. A benign melanoma is a simple mole that appears on the skin and is usually
an evenly colored brown, black, or tan. It can be round or oval, and raised or flat. In general, a benign
melanoma is smaller than 6 millimeters. Malignant melanoma is the deadliest type of skin cancer and is
characterized by bleeding sores on the skin. It arises from a cancerous growth in a pigmented skin lesion.
Malignant melanoma is classified into four types: superficial spreading melanoma (approximately 75% of all
melanomas), nodular melanoma (approximately 15%), lentigo melanoma (approximately 10%), and acral
melanoma (about 5%). Melanoma is treatable if identified early enough, and the distinction between benign
skin cancer and malignant melanoma is important in determining treatment options [1], [4]. 


2. RELATED WORKS 

Rezvantalab et al. [5] studied the classification of eight skin malignancies in 2018. The collection
contained 10,135 photos of melanoma and nevi. The architectures employed were ResNet 152, Inception
ResNet v2, and DenseNet 201. DenseNet 201 achieved an area under the curve (AUC) of 98.16% for melanoma
and BCC classification, while ResNet 152 achieved an AUC of 94.40%. 

Rehman et al. [6] used a CNN model and an artificial neural network (ANN) to classify lesions using
the datasets supplied by ISIC for its 2016 challenge. Image segmentation was performed first by intensity
thresholding, and feature extraction was then performed with the CNN. The ANN classifier used these
features to carry out the classification. They reached an accuracy of 98.32%, surpassing the previous high
of 97%. Charan et al. [7] proposed a deep learning (DL) model for classifying skin lesions. They developed a
CNN for image analysis and applied techniques such as data augmentation, to handle the natural class
imbalance, and image preprocessing. The best accuracy achieved by this model was 0.886.
Molina-Molina et al. [8] proposed, in 2020, a system that combines texture-based one-dimensional fractal
signatures with deep learning features obtained through DenseNet-201 transfer learning. Because some skin
diseases are more prevalent than others, the classes in the dataset of skin disease images are unbalanced,
so a clustering technique was used. K-nearest neighbors and two basic varieties of support vector machines
were employed as classifiers, and voting was used to choose the diagnostic result. This study found that the
mean precision, sensitivity, and accuracy were 97.35%, 91.61%, and 66.45%, respectively. 

Jinnai et al. [9] developed a technique for identifying skin cancer in 2020 by training a faster
region-based CNN (FRCNN). With 86.2% accuracy, FRCNN outperformed board-certified dermatologists and
dermatology residents. For two distinct classes (benign or malignant), FRCNN achieved a classification
accuracy of 91.5%, a sensitivity of 83.3%, and a specificity of 94.55%. These results indicate that FRCNN
outperformed dermatologists in terms of classification accuracy. 

Pomponiu et al. [10] used a regular camera to gather 399 images in order to distinguish benign nevi
from melanoma. Preprocessing and data augmentation were carried out first. A pre-trained CNN, AlexNet,
was then used to extract high-level features from the samples. The K-nearest neighbor method was employed
to classify the lesions. They obtained an accuracy of 93.62%, with a specificity of 95.18% and a sensitivity
of 92.1%. 

A total of 129,450 images were used for CNN pretraining by Esteva et al. [11]. Two binary classification
tasks were considered: distinguishing melanoma from benign nevi, and distinguishing keratinocyte carcinomas
from benign seborrheic keratoses. They used transfer learning for the classification. In both cases, the AUC
was 0.96. 

Pham et al. [12] suggested a method to improve classification using a CNN and data augmentation in
2018. They also attempted to address the issue of data scarcity and its impact on the classifier’s
efficiency. The datasets contained 6162 images for training and 600 for testing. The method attained an AUC
of 89.2%, an ACC of 89.0%, and an AP of 73.9%. They examined how image augmentation affected three
separate classifiers and found that the classifiers responded differently and produced better results than
the conventional approaches used previously. 


3. DATASETS COLLECTION 

The dataset is a balanced collection of images of benign and malignant skin moles. The data
consists of two folders with 1,800 pictures (224x244) of the two types of moles [13]. We divided this dataset
into two groups: the first 70% of the data is used for training, and the remaining 30% is used for testing.
Figure 1 shows samples of skin cancer images from the datasets. 

Figure 1. A sample of skin cancer images from the datasets 
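
The 70/30 division described above can be sketched as follows. This is only an illustration: the file names are hypothetical placeholders, the total of 1,800 items is used purely as an example size, and shuffling with a fixed seed before the split is an assumption that the text does not state.

```python
import random

def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a list of samples and split it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the mole images.
paths = [f"img_{i:04d}.jpg" for i in range(1800)]
train, test = split_dataset(paths)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible between runs, which matters when results from different experiments are compared.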


4. RESEARCH METHOD 
4.1. Preprocessing 

Preprocessing, which improves the quality of an image for later use by removing undesirable image
information or noise, is the first step in the detection phase because raw images contain noise. If this
issue is not addressed properly, the classification may contain several inaccuracies. Preprocessing is
required owing to the low contrast between skin lesions and the surrounding healthy skin, the irregular
lesion boundaries, and skin artifacts such as hairs, skin lines, and black frames. For instance, an image
containing both hairs and a lesion could be misclassified. Filters such as the median filter, mean filter,
adaptive median filter, Gaussian filter, and adaptive Wiener filter can be used to remove Gaussian noise,
speckle noise, Poisson noise, and salt-and-pepper noise. Image noise should be reduced or eliminated by
applying preprocessing techniques such as conversion of the skin image to grayscale, Gaussian blur, digital
image enhancement, image histogram equalization, and image resizing. 
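
As an illustration of one of the filters listed above, the following minimal sketch applies a median filter to a small grayscale array with NumPy. The 3x3 window, the edge-replicating border handling, and the synthetic test image are assumptions made for the example; the text does not specify which filter settings were used.

```python
import numpy as np

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D grayscale array."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.median(padded[r:r + size, c:c + size])
    return out

# Synthetic image: uniform background with a single "salt" pixel.
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)
```

Because the median ignores isolated extreme values, the single bright pixel is replaced by the background value, which is why this filter is commonly chosen for salt-and-pepper noise.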


4.1.1. Convert RGB to grayscale image 

In some applications, a color image must be converted to a grayscale representation because much of
today’s display and image capture hardware supports only 8-bit images. Furthermore, there is no need to use
color images, which are more complex and harder to process, because grayscale images are sufficient for
many operations. The image is converted from RGB to grayscale using (1) [14]: 


GRAY = 0.30R + 0.59G + 0.11B (1) 
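
A minimal NumPy sketch of (1) is given below. The function name, the rounding to 8-bit integers, and the four sample pixels are assumptions made for illustration.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale using the weights of (1)."""
    weights = np.array([0.30, 0.59, 0.11])
    gray = rgb.astype(np.float64) @ weights   # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One row of sample pixels: pure red, pure green, pure blue, white.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)
gray = rgb_to_gray(pixels)
```

The green channel receives the largest weight, so a pure green pixel maps to a brighter gray level than a pure red or pure blue one.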


4.1.2. Gaussian blur 

In image processing, a Gaussian blur, sometimes called Gaussian smoothing, is the result of blurring an
image with a Gaussian function. It is a common effect in graphics software and is often used to reduce image
noise and detail. The visual effect of this blurring technique is a smooth blur resembling that of viewing the
image through a translucent screen, which is distinctly different from the bokeh effect produced by an
out-of-focus lens or the shadow of an object under normal lighting. Gaussian smoothing is also used as a
preprocessing step in computer vision algorithms to enhance image structures at different scales [15]; see (2). 


$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}} \qquad (2)$$

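
The sketch below samples (2) on a small grid to build a blur kernel and applies it to an image. The kernel size of 5, the value of sigma, the normalisation of the sampled kernel so that it sums to one, and the edge-replicating borders are all assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample G(x, y) from (2) on an odd size x size grid and normalise it."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()  # discrete samples do not sum to 1 exactly, so rescale

def blur(image, kernel):
    """Filter a 2-D grayscale image with a symmetric kernel."""
    k = kernel.shape[0]
    half = k // 2
    padded = np.pad(image.astype(np.float64), half, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

kernel = gaussian_kernel(5, 1.0)
flat = np.full((6, 6), 50.0)
blurred = blur(flat, kernel)
```

Since the kernel is normalised, a uniform region keeps its intensity after blurring; only local variations such as noise are smoothed.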
4.1.3. Image histogram equalization 

Histogram equalization operates in the spatial domain to generate an output image with a uniform
distribution of pixel intensities, resulting in a flattened and stretched histogram. Owing to its simplicity
and effectiveness compared with other traditional methods [16], it is widely used for contrast enhancement
in applications such as medical image processing and radar signal processing. One drawback of histogram
equalization is that it can alter the brightness of an image because of its histogram-flattening property.
The technique generally improves the global contrast of an image, particularly when the relevant image data
is represented by close contrast values. Histogram equalization is computed from the cumulative
distribution function given in (3). 


$$\mathrm{cdf}(X) = \sum_{i=0}^{X} h(i) \qquad (3)$$
Where X represents the gray value and h is the image’s histogram. 
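
The following sketch shows one common way to turn the cumulative histogram of (3) into an equalised image. The rescaling of the cumulative values to the range 0 to 255 through a lookup table is the standard textbook mapping and is an assumption here, since the text only gives the cumulative sum itself; the small input array is synthetic.

```python
import numpy as np

def equalize_histogram(gray):
    """Equalise an 8-bit grayscale image using the cumulative histogram of (3)."""
    hist = np.bincount(gray.ravel(), minlength=256)   # h(i)
    cdf = np.cumsum(hist)                             # cdf(X) = sum of h(i) for i <= X
    cdf_min = cdf[cdf > 0][0]                         # first non-zero cumulative count
    total = gray.size
    if total == cdf_min:                              # constant image: nothing to stretch
        return gray.copy()
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                                  # map every pixel through the table

low_contrast = np.array([[100, 100, 110], [110, 120, 120]], dtype=np.uint8)
equalized = equalize_histogram(low_contrast)
```

The three closely spaced gray levels of the input are spread across the full 8-bit range, which is the contrast-stretching effect described above.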


4.1.4. Resize 

One of the key drawbacks of a CNN is that the images in the datasets must be scaled to a consistent
dimension. In this phase, images are converted into arrays of pixels and then resized before being fed into
the CNN. Resizing aims to reduce the computational load, speed up training, and produce an accurate model.
The images in the two datasets are resized to 20x20. The preprocessing procedures are beneficial because
they allow the algorithm to readily learn and extract information from images [17], [18]. 
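
A resize to 20x20 can be sketched as below. Nearest-neighbour sampling is an assumption, because the text does not name the interpolation method, and flattening the result to a 400-element vector is likewise only one plausible way to prepare the input for 1-dimensional layers.

```python
import numpy as np

def resize_nearest(image, new_h=20, new_w=20):
    """Resize a 2-D (or H x W x C) array with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return image[rows[:, None], cols]

# Synthetic 224x244 array standing in for one grayscale image.
image = np.arange(224 * 244, dtype=np.float64).reshape(224, 244)
small = resize_nearest(image)
vector = small.reshape(-1)   # assumption: flatten to 1-D for the 1D CNN
```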


4.2. Feature extraction approaches 

Skin cancer recognition is a high-dimensional classification problem that requires dimensionality
reduction. The extracted features are then used to classify skin cancers [19]. Global or “holistic”
approaches treat the recognition problem as a whole and extract holistic features from skin cancer images.
Principal component analysis (PCA) [20] is motivated by an information theory approach: it decomposes skin
cancer images into a small set of characteristic feature images, known as Eigen skin cancer, which are used
to represent both existing and new skin cancer images. The statistical data reported for skin cancer
recognition using PCA shows the value of this method for identifying and validating skin cancer traits. The
PCA approach requires each 2-dimensional skin cancer image matrix to be converted into a 1-dimensional
vector, which can be either a row or a column vector [21]. The average of the training images is then
computed as in (4). 


$$\mu = \mathrm{Average} = \frac{1}{M} \sum_{n=1}^{M} \mathrm{trainingimage}(n) \qquad (4)$$


Where M is the total number of images in the training set, μ is the mean (average) image, and sub denotes a
training image from which the average μ has been subtracted. 
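
The PCA steps above (flattening, averaging as in (4), and mean subtraction) can be sketched with NumPy as follows. Using the singular value decomposition to obtain the principal axes, the number of retained components, and the random synthetic images are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def pca_features(images, n_components=2):
    """Flatten images, subtract the mean image of (4), and project onto principal axes."""
    M = images.shape[0]
    X = images.reshape(M, -1).astype(np.float64)   # each 2-D image becomes a row vector
    mean = X.sum(axis=0) / M                       # Average in (4)
    sub = X - mean                                 # mean-subtracted training images
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    components = vt[:n_components]
    return mean, components, sub @ components.T

rng = np.random.default_rng(0)
images = rng.random((10, 20, 20))   # synthetic stand-ins for ten 20x20 images
mean, components, features = pca_features(images, n_components=3)
```

Each row of `features` is a low-dimensional description of one image; a new image would be described the same way by subtracting `mean` and projecting onto `components`.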


5. DEEP LEARNING 
DL is a set of machine learning algorithms and architectures characterized by hierarchical layers of
nonlinear information processing. Most DL architectures can be divided into three classes according to how
they are intended to be used [22]: 

a) Deep generative architectures seek to capture the complex correlation features and joint statistical
distributions of the visible data and their corresponding classes. These architectures aim to generate
new data. They can, however, be converted into discriminative models using the Bayes rule. 

b) Deep discriminative architectures, on the other hand, focus on providing strong discriminative power for
pattern classification by characterizing the posterior probability distributions of classes conditioned on
the visible data. They are specifically designed for discriminative modeling and do not attempt to model
the underlying generative process of the data. The main goal of deep discriminative architectures is to
accurately classify data samples into different classes based on the input features. 

c) Hybrid deep architectures, where the aim is discrimination but is assisted (sometimes dramatically) by
the results of generative architectures through better optimization or regularization. 

In the early 1990s, it was believed that training multi-layered networks with backpropagation was
nearly impossible. Since then, DL techniques, which are composed of stacked neural networks, have proven to
be powerful tools for analyzing big data. DL incorporates a variety of neural networks, the most significant
of which is the CNN [23]. 

Convolutional neural networks [24] are a sub-type of deep discriminative architecture that excels at
processing two-dimensional data with grid-like topologies. CNNs are commonly used to process images and
video. Deep 2D CNNs, which have millions of parameters and numerous hidden layers, can learn complicated
objects and patterns provided they are trained on a sizable visual database with ground-truth labels. Owing
to this property, they are the main tool for many 2D signal engineering applications. The design of the CNN
is inspired by the visual cortex of the human brain, where cells operate as local filters across the input
space, with more sophisticated cells having larger receptive fields. 


6. PROPOSED DEEP LEARNING MODEL 

Among neural networks, the CNN is one of the main categories used for image recognition. It processes an
image that is provided as input and then categorizes it. The CNN model was implemented with Keras, a free
and open-source DL library. Each input image passes through convolution layers with filters (kernels),
pooling layers, and fully connected (FC) layers to classify objects with probability values between 0 and 1.
CNNs function in the same way in one, two, or three dimensions. The differences lie in the structure of the
input data and in how the filter, sometimes referred to as a convolution kernel or feature detector,
traverses it. The parameters of the 1-dimensional CNN layers used in this work are described below: 

a) 1D convolution layer: 1D convolutional neural networks (1D CNNs), a modified form of 2D CNNs, have been
developed recently. Their benefit is that they demand little computing power. Since 1D CNNs only
perform 1D convolutions, their simple and compact configuration makes real-time, low-cost hardware
implementation possible. A 1D convolution layer convolves its kernel with the input over a single spatial
(or temporal) dimension to produce a tensor of outputs. 

b) Max pooling: a pooling layer follows the convolution layer, specifically after a non-linearity (e.g.,
ReLU) has been applied to the feature maps output by the convolution layer. Max pooling keeps the
maximum value in each window, which helps make the representation invariant to small translations of
the input. Placing max-pooling layers between convolution layers increases feature and spatial
abstraction. For 1D temporal data, max pooling down-samples the input representation by taking the
maximum value over a window whose length is defined by the pool size. The window is shifted by strides.
With the “valid” padding option, the output has the shape:
output shape = (input shape - pool size + 1)/strides. 

c) Dense layer: this is the standard densely connected neural network layer and the most commonly used
one. The dense layer applies the operation below to its input and returns the result. The number of
neurons (units) set in the dense layer determines the output shape. Dense acts as in (5): 


Output = activation(dot(input, kernel) + bias) (5) 


The kernel is the layer’s weights matrix, the bias is the layer’s bias vector (only relevant if the use of
a bias is enabled), and the activation is the element-wise activation function given as the activation
argument. 

d) Activation function: activation functions are mathematical equations that determine the output of a
neural network. The function is attached to each neuron and determines whether the neuron should be
activated, depending on whether its input is relevant to the model’s prediction. Some activation
functions also keep the output of each neuron between 0 and 1 or between -1 and 1. The rectified linear
activation function (ReLU) is used as a nonlinear activation function in deep and multi-layer neural
networks. It is given in (6). 


f(x) = max(0, x) (6) 


The derivative of the ReLU function for positive input is 1, which makes deep neural network training
faster than with traditional activation functions: because this derivative is constant, no extra time
is spent computing error terms during training. As the number of layers increases, the ReLU function is
also less prone to the vanishing gradient problem. The function has no asymptotic upper limit, while its
output is bounded below by zero. 

e) Softmax function: the softmax function converts a K-dimensional vector of real values into a K-dimensional
vector of probabilities that sum to one. In many neural networks, it converts scores into a probability
distribution that can be interpreted directly or used as input for other systems. It is applied to the
output of the last layer; see (7). 


$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=0}^{N} e^{z_j}} \qquad (7)$$


Where N represents the number of classes, z is the input vector, and σ(z) is the output class probability. 

f) Stride: a parameter of neural networks, such as CNNs, that are tuned for image and video data. The
stride of the filter controls how far the filter moves across the image or video at each step. If the
stride is set to 1, the filter advances one pixel or unit at a time. Because the filter size affects the
encoded output volume, the stride is usually set to a whole number rather than a fraction or decimal. 

g) Padding: padding in CNNs refers to adding extra pixels to an image’s edges for more precise analysis.
The most typical type of padding is zero padding, and the amount of padding can be varied. This method
widens the processing window for a CNN and makes it possible to recognize features with greater accuracy. 

h) Flatten: converts the output of the preceding layers into a single vector that can be used as the input
for the following layer. 
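
The layer operations described in items a) to h) can be sketched in plain NumPy as follows. This is a didactic sketch, not the Keras implementation used in the paper: the function names, the single-channel input, and the toy kernel and weights are assumptions, and the output length of the valid pooling is written as floor((input - pool size)/stride) + 1.

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel) for i in range(n_out)])

def max_pool1d(x, pool_size=2, stride=2):
    """Maximum over each window of length pool_size, moved by stride."""
    n_out = (len(x) - pool_size) // stride + 1
    return np.array([x[i * stride:i * stride + pool_size].max() for i in range(n_out)])

def relu(x):
    """Equation (6): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def softmax(z):
    """Equation (7), with the maximum subtracted for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dense(x, kernel, bias, activation):
    """Equation (5): activation(dot(input, kernel) + bias)."""
    return activation(np.dot(x, kernel) + bias)

# Toy input signal and weights, chosen only to make the steps easy to follow.
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
features = relu(conv1d(x, np.array([1.0, -1.0])))   # differences, negatives zeroed
pooled = max_pool1d(features, pool_size=2, stride=2)
probs = dense(pooled, np.ones((2, 2)), np.zeros(2), softmax)
```

Chaining the functions in this order mirrors the convolution, activation, pooling, and dense stages of the description, with softmax producing class probabilities that sum to one.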

The proposed CNN model is illustrated in Figure 2. It has 27 layers: i) eight 1D convolution layers for
feature extraction, ii) eight LeakyReLU layers, iii) seven 1D max-pooling layers, iv) three fully connected
(dense) layers, and v) one flatten layer. 
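
The layer counts listed above can be checked with a short sketch. Only the counts come from the text; the ordering of the layers (each convolution followed by LeakyReLU, with pooling after the first seven convolutions), the pool size of 2, the length-preserving padding of the convolutions, and the input length of 400 (a flattened 20x20 image) are hypothetical choices.

```python
def build_layer_plan(n_conv=8, n_pool=7, n_dense=3):
    """Return one possible layer sequence matching the counts in the text."""
    plan = []
    for i in range(n_conv):
        plan += ["Conv1D", "LeakyReLU"]
        if i < n_pool:                 # assumption: pooling after the first seven blocks
            plan.append("MaxPooling1D")
    plan.append("Flatten")
    plan += ["Dense"] * n_dense
    return plan

def trace_length(plan, length, pool_size=2):
    """Follow the sequence length through the stack, assuming convolutions that
    preserve the length and non-overlapping pooling windows."""
    for name in plan:
        if name == "MaxPooling1D":
            length = length // pool_size
    return length

plan = build_layer_plan()
final_length = trace_length(plan, length=400)  # 400 = flattened 20x20 image (assumption)
```

Under these assumptions the seven pooling stages shrink the sequence length from 400 to 3 before the flatten and dense layers, which shows why small inputs limit how many pooling layers a 1D stack can contain.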


Figure 2. The proposed CNN model 


7. EVALUATION METRICS 
True positive (TP) is the correct classification of the positive class; for example, the model classifies
an image as cancerous when it does contain malignant cells. True negative (TN) is the correct classification
of the negative class; for example, the model reports that no cancer is present and none is present in the
image. False positives (FP) are erroneous positive predictions; for example, the model predicts that an
image is malignant even though it contains no malignancy. False negatives (FN) are erroneous negative
predictions; for example, the model classifies an image as not having cancer although it includes
malignant cells. 
a) Precision: precision is the number of TP divided by the sum of TP and FP, where FP are, in our case,
images mistakenly classified as malignant by the model. FP occur when the model labels something as
positive although it is actually negative [25]. This is shown in (8). 


$$\mathrm{Precision} = \frac{TP}{TP+FP} \qquad (8)$$


TELKOMNIKA Telecommun Comput El Control, Vol. 22, No. 1, February 2024: 129-137 




b) Recall: recall indicates the percentage of relevant data points that the model correctly identified. It
is the capacity to find every relevant example in a dataset [26]. This is shown in (9). 


$$\mathrm{Recall} = \frac{TP}{TP+FN} \qquad (9)$$


c) F1-score: a system with high recall but low precision returns many results, but most of its predicted
labels are incorrect when compared with the training labels. The opposite is true for a system with low
recall and high precision: it returns few results, but most of its predicted labels are correct. A single
number that summarizes how well a system performs is useful for assessing its effectiveness. This is
provided by the F1-score, defined as the harmonic mean of precision and recall. The F1-score can be
viewed as an “average” of the two that accounts for how similar they are [27]. This is shown in (10). 


$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (10)$$


d) Accuracy: an algorithm can be analyzed by running it on test data and dividing the test predictions
into four sets. A TP observation is positive and is predicted to be positive, whereas a TN observation is
negative and is predicted to be negative. An FP observation is negative even though a positive outcome was
predicted, and an FN observation is positive even though a negative outcome was predicted. The
classification accuracy is calculated by dividing the number of correct predictions by the total number of
predictions [28], as shown in (11). 


$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \qquad (11)$$
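
Equations (8) to (11) can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name and the example counts are made up for illustration and are not the results reported in this paper; returning zero when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision (8), recall (9), F1-score (10) and accuracy (11)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up counts for illustration only; they are not the paper's results.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```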


8. RESULTS 

The results section presents the evaluation metrics and analysis of the proposed DL approach for 
classifying MSC. Table 1 summarizes the model’s accuracy, precision, recall, and F1-score performance. 
Figure 3 presents the confusion matrix that shows the number of TP, TN, FP, and FN for each class. The 
following sections provide a detailed discussion of the results and their implications. 


Table 1. The experimental results of implemented DL 
Class   Precision (%)   Recall (%)   F1-score (%)   Accuracy (%)
0       99.9            99.9         99.9           99.9
1       99.9            99.9         99.9           99.9

Figure 3. Results of implemented DL 


9. DISCUSSION 

As Table 1 and Figure 3 show, the model achieved near-perfect precision, recall, and F1-score for both
classes (melanoma and non-melanoma), indicating that it performed very well on the test datasets. The
accuracy is likewise near perfect, meaning that the model correctly classified almost all of the test
samples. Table 2 compares the proposed system with the related works discussed in section 2, using the
performance percentage reported by each work as the only evaluation measure. The comparison indicates that
the proposed system outperformed all of these previous works in the classification of skin cancer, reaching
an accuracy that none of them had previously attained. 


Table 2. Comparison between the related works and the proposed system 


Work                        Reported performance (%)
Rezvantalab et al. [5]      98.16 (AUC)
Rehman et al. [6]           98.32 (accuracy)
Charan et al. [7]           88.6 (accuracy)
Molina-Molina et al. [8]    97.35 (precision), 91.61 (sensitivity), 66.45 (accuracy)
Jinnai et al. [9]           91.5 (accuracy)
Pomponiu et al. [10]        93.62 (accuracy)
Esteva et al. [11]          96 (AUC)
Pham et al. [12]            89.2 (AUC), 89.0 (ACC), 73.9 (AP)
Proposed system             99.9 (accuracy)


10. CONCLUSION 

In conclusion, the proposed DL approach for classifying MSC using a CNN model with 27 layers
shows promising results. The CNN model is carefully designed to extract features from skin lesion images and
classify them into melanoma and non-melanoma classes. The use of multiple convolution layers, batch
normalization layers, max-pooling layers, fully connected layers, dropout layers, and data augmentation
techniques contributes to the accuracy and generalization of the model. The experimental findings on publicly
accessible benchmark datasets for skin lesion classification indicate that the proposed CNN model outperforms
existing state-of-the-art approaches. The approach can therefore potentially improve the accuracy and
efficiency of skin lesion classification and can be applied in clinical settings to assist dermatologists in
early MSC detection. 